Vision-based Active Speaker Detection in Multiparty Interactions

Authors

Kalin Stefanov

Jonas Beskow

Giampiero Salvi

Abstract

This paper presents a supervised learning method for automatic visual detection of the active speaker in multiparty interactions. The detectors are built using a multimodal multiparty interaction dataset previously recorded to explore patterns in humans' focus of visual attention. Three conditions are included: two humans involved in a task-based interaction with a robot; the same two humans involved in a task-based interaction in which the robot is replaced by a third human; and a free three-party human interaction. The paper also evaluates the active speaker detection method in a speaker-dependent experiment, showing that the method achieves good accuracy rates in a fairly unconstrained scenario using only image data as input. The main goal of the method is to provide real-time detection of the active speaker within a broader framework implemented on a robot and used to generate natural focus-of-visual-attention behavior during multiparty human-robot interactions.

Related resources

Towards Speaker Detection using Lips Movements for Human-Machine Multiparty Dialogue

This paper explores the use of lips movements for the purpose of speaker and voice activity detection, a task that is essential in multi-modal multiparty human machine dialogue. The task aims at detecting who and when someone is speaking out of a set of persons. A multiparty dialogue consisting of 4 speakers is audiovisually recorded and then annotated for speaker and speech/silence segments. L...

Automatic social role recognition and its application in structuring multiparty interactions

Automatic processing of multiparty interactions is a research domain with important applications in content browsing, summarization and information retrieval. In recent years, several works have been devoted to finding regular patterns that speakers exhibit in a multiparty interaction, also known as social roles. Most of the research in the literature has generally focused on recognition of scenario s...

Floor holder detection and end of speaker turn prediction in meetings

We propose a novel fully automatic framework to detect which meeting participant is currently holding the conversational floor and when the current speaker turn is going to finish. Two sets of experiments were conducted on a large collection of multiparty conversations: the AMI meeting corpus. Unsupervised speaker turn detection was performed by post-processing the speaker diarization and the s...

Cross-Modal Supervision for Learning Active Speaker Detection in Video

In this paper, we show how to use audio to supervise the learning of active speaker detection in video. Voice Activity Detection (VAD) guides the learning of the vision-based classifier in a weakly supervised manner. The classifier uses spatio-temporal features to encode upper body motion facial expressions and gesticulations associated with speaking. We further improve a generic model for acti...

Spatio-Temporal Analysis of Spontaneous Speech with Microphone Arrays

Accurate detection, localization and tracking of multiple moving speakers permits a wide spectrum of applications. Techniques are required that are versatile, robust to environmental variations, and not constraining for non-technical end-users. Based on distant recording of spontaneous multiparty conversations, this thesis focuses on the use of microphone arrays to address the question “Who spo...

Publication date: 2017
